-
Notifications
You must be signed in to change notification settings - Fork 9
Adding snapshotting functionality for MergingTree
#602
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding snapshotting functionality for MergingTree
#602
Conversation
3d07f52 to
e5acb06
Compare
dcoutts
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good.
I think we should rebase this after #588 since there's some related churn to the run/merging run/incoming run types and code.
We need to review the reference handling pattern. It should be callee duplicates when it retains references not caller. We should avoid functions that consume a reference / take ownership of a reference.
And I think we should extend this within the same PR to include the integration into the top level src/Database/LSMTree/Internal.hs, since otherwise we're not able to integrate it into the SnapshotMetaData type as we expect to do.
e5acb06 to
3acc5da
Compare
|
Rebased on #588. |
| -> SnapMergingRunState t (Ref (Run m h)) | ||
| -> m (Ref (MR.MergingRun t m h)) | ||
| getSnapMergingRunState nr md mc smrs = do | ||
| let ln = LevelNo 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. This is fishy. We should figure out proper level numbers. Perhaps that means we need to include them in the snapshot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this problem has "moved" from the old call site, to the definitions of
toSnapMergingTreeStatetoSnapPreExistingRun
The thing is, I don't know where to get the "correct" LevelNo from. The two call sites supplying LevelNo 0 to the getSnapMergingRunState function involve converting the MergingTreeState and PreExistingRun constructors:
SnapOngoingTreeMergeSnapPreExistingMergingRun
I can't find any reasonable place form within the MergingTree data-type to extract a LevelNo from. Should the supplied LevelNo for the MergingTree be one grater than the number of Levels in the table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, after rebasing on main the problem site has moved again. The problem now manifests in fromSnapMergingRunState and is annotated with a TODO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, after rebasing on
mainthe problem site has moved again. The problem now manifests infromSnapMergingRunStateand is annotated with a TODO.
Is this still a live issue? I don't see the TODO on fromSnapMergingRunState. And in the latest #608 there is no need for a LevelNo in fromSnapMergingRunState. So I hope this is now resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I wasn't clear enough about this. So in fromSnapMergingRunState there is this definition:
--TODO: the threshold should be stored with the MergingRun
-- here we want to supply the credits now, so we can use a threshold of 1
let thresh = MR.CreditThreshold (MR.UnspentCredits 1)
_ <- MR.supplyCreditsAbsolute mr thresh mergeCredits
return mr
The "correct" way to create a CreditThreshhold, from what I can intuit from the exposed APIs, is to call Database.LSMTree.Internal.MergeSchedule.creditThresholdForLevel, and that function requires a LevelNo. That is what I meant by saying that the problem call site has just moved to the fromSnapMergingRunState definition. Is there another, better way to set the CreditThreshhold?
600b813 to
94d9211
Compare
|
I believe that, at long last, this is ready to merge. |
94d9211 to
837e891
Compare
|
Looks like a minor rebase mistake (one of my patches appears twice, the second with minimal content). Removing this second one still works fine so I'll just force push it. |
Thanks for doing that! |
22e9d5a to
52899f1
Compare
|
Phew, I finally got that |
52899f1 to
ae3d37f
Compare
Use utils for maybe, list and vector. And use a single-item encoding for Maybe, rather than it being a pair of things. The pair style means one has to remember to count it as two when setting the tuple lengths.
e1fbf2a to
ca1a591
Compare
Adding snapshotting functionality for
MergingTreeSummary
This PR contains the snapshotting functionality for the
MergingTreedata-type. In order to create a snapshot of aMergingTree, additional prerequisite functionality is also included:Encode.DecodeVersioned.As with creating the "metadata" snapshot, Intermediate "
Snap-" types are created to media the conversion between a disk-representable format and the in-memory representation.This PR includes test cases for:
Snap- data-types'sArbitraryinstances.SnapshotMergingTreedata to detect breakage of the format's encoding.SnapshotMergingTreein memorySnapshotMergingTreeto and from disk viafs-api.Potential issues:
While the
SnapshotMergingTreedata successfully round-trips to/from disk, I am not certain that conversion from aSnapshotMergingTreeback to the in-memory representation ofMergingTreeis correct. TheDatabase.LSMTree.Internal.Snapshot.fromSnapMergingTreefunction defines this functionality, but I'm not sure I understand the usage ofRefandRefcounterwell enough to ensure that the conversion is done correctly. Careful review of this function and/or suggestions for testing this conversion.The functionality needs to be integrated into the
Database.LSMTree.Internal.createSnapshotfunction after the questions below is resolved.Currently, the
MergingTreedata snapshots are separate from theSnapshotMetadata, with each data-type having their own duplicate functions:-
writeFileSnapshotMergingTree-
readFileSnapshotMergingTree-
encodeSnapshotMergingTree-
decodeSnapshotMergingTreeThe functionality for creating snapshots of the
MergingTreedata-type was created separately for ease of debugging and testing. However, now that the functionality is stable, this separation may be undesirable. It appears thatcreateSnapshotis an "atomic" operation for the wholeTableand anyMergingTreewithin theTableshould be serialized together within theSnapshotMetadata.Questions
Should we have the
SnapshotMergingTreebe an entry within theSnapshotMetadatarecord, effectively combining the two types?Does every use case for creating a
SnapshotMetadatasnapshot necessitate snapshots of anyMergingTree(s)?Does there exist a use case to snapshot a
MergingTreewithout also creating a SnapshotMetadata` snapshot?